Suzuki, Taiji - Mathematics of Deep Learning
https://gyazo.com/9eb4355e2d576490e8ed810b29f2d732
hillbig Theoretical analysis of deep learning by Dr. Taiji Suzuki, covering representation power, generalization, and optimization theory. It spans a wide range of important topics, including the recent Neural Tangent Kernel and the double descent phenomenon. I don't think anything this comprehensive exists in English.
Daisuke Okanohara
I get an error when accessing the original SlideShare, but it is still viewable on X/Twitter. A cached copy, perhaps?
https://gyazo.com/67a306cd676616557e1b7f2aef5f4a46
https://gyazo.com/91bb95807b70a093ab6e502e77183bfb
Kolmogorov's superposition theorem (statement below)
universal approximation
Ridgelet transform
Representation power and the number of layers
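For reference, the superposition theorem above states that any continuous function on [0,1]^n decomposes into sums and compositions of continuous univariate functions:

$$f(x_1,\dots,x_n) = \sum_{q=0}^{2n} \Phi_q\left(\sum_{p=1}^{n} \phi_{q,p}(x_p)\right)$$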
https://gyazo.com/81c47eba5249bf7a67c78a1f9c059646
As an easy-to-understand concrete example: for a function whose value is determined by the distance from the origin, four layers suffice with only polynomially many units in the input dimension (frankly, I think it is linear).
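As a toy sketch of the universal approximation idea (my own construction, not from the slides; the target function, widths, and random-feature setup are arbitrary choices): a one-hidden-layer ReLU network with random hidden weights is fit by least squares on the output layer, and the error typically shrinks as the width grows.

```python
# Toy universal-approximation sketch (not from the slides): one hidden ReLU layer
# with random weights, only the output layer trained by least squares.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 200)
y = np.sin(2 * np.pi * x)                      # arbitrary target function

for width in [5, 20, 100]:
    w = rng.normal(size=width)                 # random hidden weights
    b = rng.uniform(-1.0, 1.0, size=width)     # random hidden biases
    H = np.maximum(0.0, np.outer(x, w) + b)    # hidden features, shape (200, width)
    a, *_ = np.linalg.lstsq(H, y, rcond=None)  # fit output layer only
    print(f"width={width:4d}  sup error={np.max(np.abs(H @ a - y)):.4f}")
```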
Kernel Method and Ridge Regression
reproducing kernel Hilbert space
Kernel ridge regression is then restated in terms of reproducing kernel Hilbert spaces, but I'll skip that part.
Deep learning can be interpreted as learning the kernel function itself from the data.
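A minimal kernel ridge regression sketch (my illustration; the Gaussian kernel, bandwidth, and regularization strength are arbitrary choices):

```python
# Minimal kernel ridge regression with a Gaussian (RBF) kernel; bandwidth gamma
# and regularization lam are arbitrary choices for illustration.
import numpy as np

def rbf_kernel(a, b, gamma=10.0):
    # k(s, t) = exp(-gamma * (s - t)^2) for scalar inputs
    return np.exp(-gamma * (a[:, None] - b[None, :]) ** 2)

rng = np.random.default_rng(1)
x_train = rng.uniform(0.0, 1.0, 30)
y_train = np.sin(2 * np.pi * x_train) + 0.1 * rng.normal(size=30)

lam = 1e-3
K = rbf_kernel(x_train, x_train)
# representer theorem: the solution lives in the span of k(., x_i)
alpha = np.linalg.solve(K + lam * np.eye(len(x_train)), y_train)

x_test = np.linspace(0.0, 1.0, 5)
print(rbf_kernel(x_test, x_train) @ alpha)     # predictions at test points
```

The contrast drawn in the slides: here the kernel, and hence the feature map, is fixed before seeing the data, while deep learning in effect learns it.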
https://gyazo.com/559eee2c77c2faabcbfb3bb4388510ed
...
https://gyazo.com/80bc5f45abdf998da8cc85537a1eccd5
double descent (a toy demo follows below)
implicit regularization
Generalization error bound
this part is skipped
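A toy demo of the double descent mentioned above (my own construction, not from the slides; random ReLU features with a minimum-norm least-squares fit, all sizes arbitrary). The test error typically peaks near the interpolation threshold, where the number of features p equals the number of samples n, and falls again beyond it.

```python
# Toy double-descent demo (my construction): minimum-norm least squares on
# random ReLU features; watch the test error around p == n.
import numpy as np

rng = np.random.default_rng(2)
n, n_test = 40, 500
target = lambda t: np.sin(3.0 * t)
x, x_test = rng.uniform(-1, 1, n), rng.uniform(-1, 1, n_test)
y, y_test = target(x) + 0.1 * rng.normal(size=n), target(x_test)

def features(t, w, b):
    return np.maximum(0.0, np.outer(t, w) + b)   # random ReLU feature map

for p in [10, 30, 40, 50, 200, 1000]:
    w, b = rng.normal(size=p), rng.uniform(-1.0, 1.0, p)
    beta = np.linalg.pinv(features(x, w, b)) @ y          # minimum-norm fit
    mse = np.mean((features(x_test, w, b) @ beta - y_test) ** 2)
    print(f"p={p:5d}  test MSE={mse:.3f}")
```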
Approximation performance by function class
piecewise smooth function
https://gyazo.com/c00662aa291b6c2f43a9bb31b8971516
mixed-smoothness
https://gyazo.com/ab6fcaef4375e2fd5cb35ac9b67ce914
https://gyazo.com/12f8bab940df517a48c17bc1790f6794
https://gyazo.com/71e2a50728f2efb8eb73288bcf086f6e
kernel ridge regression
adaptive method
deep learning
sparse estimation
I guess if too many candidate functions have to be prepared in advance, the method becomes impractical.
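A toy sketch of the adaptive/sparse idea (my illustration, not from the slides): Lasso solved by ISTA selects a small number of atoms out of a large dictionary prepared in advance, and that dictionary is exactly the part that becomes impractical when it has to be too large.

```python
# Toy sparse estimation: Lasso solved by ISTA picks a few atoms from a large
# dictionary prepared in advance (all sizes and lam are arbitrary choices).
import numpy as np

rng = np.random.default_rng(3)
n, p, k = 50, 400, 5
D = rng.normal(size=(n, p)) / np.sqrt(n)       # dictionary of p candidate atoms
beta_true = np.zeros(p)
beta_true[rng.choice(p, size=k, replace=False)] = rng.normal(size=k)
y = D @ beta_true + 0.01 * rng.normal(size=n)

lam = 0.01
step = 1.0 / np.linalg.norm(D, 2) ** 2         # 1 / Lipschitz const. of gradient
beta = np.zeros(p)
for _ in range(2000):                          # ISTA: gradient step + soft threshold
    z = beta - step * (D.T @ (D @ beta - y))
    beta = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)

print("atoms selected:", np.count_nonzero(np.abs(beta) > 1e-6), "out of", p)
```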
Besov space
https://gyazo.com/0e64538bcf2dbcaa22bc9756d52f21cc
https://gyazo.com/b109a40895945675dc80d9a21dd3262d
https://gyazo.com/b8de7edc8b5ffb8bdc6e4645cf40d904
The various function classes mentioned in the discussion so far are special cases of the [Besov space].
https://gyazo.com/1e444edc999820b1513eed7128bc05a0
https://gyazo.com/b6700e9f6e38b23a4a2d998ff3096101
https://gyazo.com/417c9adfc08d6f1968519f715a76e1d9
→Sparsity.
https://gyazo.com/73f00defbc6ede271105ea5fa0fe2372
Deep NNs can approximate elements of Besov spaces
Cardinal B-splines can be well approximated by ReLU NNs
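For the lowest order this is exact, which is worth seeing once: the piecewise-linear cardinal B-spline, i.e. the tent function, is literally relu(x) - 2*relu(x - 1) + relu(x - 2); higher-order B-splines are piecewise polynomials and are only approximated.

```python
# The tent (hat) function -- the piecewise-linear cardinal B-spline -- is
# *exactly* relu(x) - 2*relu(x - 1) + relu(x - 2); check numerically.
import numpy as np

relu = lambda t: np.maximum(0.0, t)

def hat(x):                                    # 0 outside [0, 2], peak 1 at x = 1
    return np.where(x < 1.0, np.clip(x, 0.0, 1.0), np.clip(2.0 - x, 0.0, 1.0))

x = np.linspace(-1.0, 3.0, 1001)
lhs = relu(x) - 2.0 * relu(x - 1.0) + relu(x - 2.0)
print("max deviation:", np.max(np.abs(lhs - hat(x))))   # 0.0 up to float error
```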
https://gyazo.com/ff1bdd3dac6e15926b05534bd189f812
https://gyazo.com/e81cdac1d099c519afd93056d04d7ca8
Deep learning is superior when spatial smoothness is non-uniform
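A toy analogy for this point (my construction; adaptivity is played by knot placement rather than by an actual network): piecewise-linear interpolation of f(x) = sqrt(|x - 0.3|) with the same knot budget, uniform versus graded toward the kink. The graded knots win by a wide margin because the function is smooth away from a single bad point, which is the non-uniform-smoothness regime.

```python
# Same knot budget, two placements: uniform knots vs. knots graded toward the
# kink of f(x) = sqrt(|x - 0.3|).
import numpy as np

f = lambda x: np.sqrt(np.abs(x - 0.3))
xs = np.linspace(0.0, 1.0, 100001)             # fine grid for measuring error

def sup_err(knots):
    return np.max(np.abs(np.interp(xs, knots, f(knots)) - f(xs)))

n = 33
uniform = np.linspace(0.0, 1.0, n)
t = np.linspace(-1.0, 1.0, n)                  # cubic grading clusters knots at 0.3
graded = np.unique(np.concatenate(([0.0, 1.0], np.clip(0.3 + 0.7 * t**3, 0.0, 1.0))))

print("uniform knots:", sup_err(uniform))
print("graded knots :", sup_err(graded))
```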
Mixed-smooth Besov space
A non-stochastic (deterministic) gradient method can take exponential time to escape saddle points.
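A toy view of this claim (my construction, not from the slides): on f(x, y) = x^2 - y^2, deterministic gradient descent started a distance eps from the stable manifold {y = 0} needs on the order of log(1/eps) steps to escape, so a trajectory that approaches saddles exponentially closely pays an exponential total cost, and eps = 0 never escapes.

```python
# Gradient descent on f(x, y) = x^2 - y^2: the saddle at the origin repels along
# y, but escape from distance eps takes ~log(1/eps) steps, and eps = 0 never
# escapes (capped at 10,000 iterations here).
import numpy as np

eta = 0.1
for eps in [1e-4, 1e-8, 1e-16, 0.0]:
    x, y, steps = 1.0, eps, 0
    while abs(y) < 1.0 and steps < 10_000:
        x, y = x - eta * 2.0 * x, y + eta * 2.0 * y   # step along minus the gradient
        steps += 1
    print(f"eps={eps:.0e}  steps to escape: {steps}")
```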
https://gyazo.com/d7195cb5e0d60b3058f8715d9a77ec60
Neural Tangent Kernel
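A minimal empirical NTK sketch (my construction): for a one-hidden-layer ReLU net f(x) = (1/sqrt(m)) * sum_j a_j * relu(w_j * x), the tangent kernel is K(x, x') = <grad_theta f(x), grad_theta f(x')>; as the width m grows it concentrates around a deterministic limit, which is what makes wide-network training kernel-like.

```python
# Empirical NTK of f(x) = (1/sqrt(m)) * sum_j a_j * relu(w_j * x) at two fixed
# scalar inputs: samples of K(x1, x2) concentrate as the width m grows.
import numpy as np

def empirical_ntk(x1, x2, w, a):
    m = w.size
    h1, h2 = np.maximum(0.0, w * x1), np.maximum(0.0, w * x2)   # hidden activations
    d1, d2 = (w * x1 > 0).astype(float), (w * x2 > 0).astype(float)
    ka = h1 @ h2 / m                            # part from gradients w.r.t. a_j
    kw = (a * d1 * x1) @ (a * d2 * x2) / m      # part from gradients w.r.t. w_j
    return ka + kw

rng = np.random.default_rng(4)
x1, x2 = 0.7, -0.3
for m in [100, 10_000, 1_000_000]:
    vals = [empirical_ntk(x1, x2, rng.normal(size=m), rng.normal(size=m))
            for _ in range(5)]
    print(f"m={m:8d}  K(x1, x2) samples:", np.round(vals, 3))
```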
Mean Field
https://gyazo.com/38423f31f4c743520a0977cece288c06
Wasserstein distance
---
This page is auto-translated from /nishio/鈴木大慈-深層学習の数理 using DeepL. If you find something interesting but the auto-translated English is not good enough to understand it, feel free to let me know at @nishio_en. I'm very happy to spread my thoughts to non-Japanese readers.